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Abstract 

A fundamental problem in computer science is to find all the common zeroes of m quadratic poly- 
nomials in n unknowns over F2. The cryptanalysis of several modern ciphers reduces to this problem. 
Up to now, the best complexity bound was reached by an exhaustive search in 41og 2 n2" operations. 
We give an algorithm that reduces the problem to a combination of exhaustive search and sparse linear 
algebra. This algorithm has several variants depending on the method used for the linear algebra step. 
Under precise algebraic assumptions on the input system, we show that the deterministic variant of our 
algorithm has complexity bounded by O(2 0Mln ) when m = n, while a probabilistic variant of the Las 
Vegas type has expected complexity (9(2 792 "). Experiments on random systems show that the algebraic 
assumptions are satisfied with probability very close to 1 . We also give a rough estimate for the actual 
threshold between our method and exhaustive search, which is as low as 200, and thus very relevant for 
cryptographic applications. 

1 Introduction 

Motivation and Problem Statement. Solving multivariate quadratic polynomial systems is a fundamental 
problem in Information Theory. Moreover, random instances seem difficult to solve. Consequently, the 
security of several multivariate cryptosystems relies on its hardness, either directly (e.g., HFE 11321 . UOV 
[28],. . . ) or indirectly (e.g., McEliece |[20l ). In some cases, systems of special types have to be solved, but 
recent proposals like the new Polly Cracker type cryptosystem [TJ rely on the hardness of solving random 
systems of equations. This motivates the study of the complexity of generic polynomial systems. A particu- 
larly important case for applications in cryptology is the Boolean case; in that case both the coefficients and 
the solutions of the system are over ¥2. The main problem to be solved is the following: 

The Boolean Multivariate Quadratic Polynomial Problem (Boolean MQ) 
Input: {f 1,... J m ) G ¥ 2 [xi,... ,x n ]' n with deg(/ ; ) = 2 for i = 1, .. . ,m. 
Question: Find - if any - all z, 6 ¥^ such that fi(z) = ■ ■ ■ = f m (z) = 0. 

Another related problem stems from the fact that in many cryptographic applications, it is sufficient to find 
at least one solution of the corresponding polynomial system (in that case a solution is the original clear 
message or is related to the secret key). For instance, the stream cipher QUAD (6j|71 relies on the iteration 
of a set of multivariate quadratic polynomials over F2 so that the security of the keystream generation is 
related to the difficulty of finding at least one solution of the Boolean MQ problem. Thus, we also consider 
the following variant of the Boolean MQ problem: 
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The Boolean Multivariate Quadratic Polynomial Satisfiability Problem (Boolean MQ SAT) 
Input: (/i, . . . ,f m ) G ¥ 2 [xi, . . . ,x„] m with deg(/ ; -) = 2 for i = 1, .. . ,m. 
Question: Find - if any - one z, €¥'2 such that fi(z) = ■ ■ ■ = f m (z) = 0. 

Testing for the existence of a solution is an NP-complete problem (it is plainly in NP and 3-SAT can be 
reduced to it [22]). Clearly, the Boolean MQ problem is at least as hard as Boolean MQ SAT, while an 
exponential complexity is achieved by exhaustive search. 

Throughout this paper, random means distributed according to the uniform distribution (given m and n, a 
random quadratic polynomial is uniformly distributed if all its coefficients are independently and uniformly 
distributed over F2). The relation between the difficulties of Boolean MQ and Boolean MQ SAT depends on 
the relative values of m and n. When m > n, the number of solutions of the algebraic system is or 1 with 
large probability and thus finding one or all solutions is very similar, while when m = n, the probability that 
a random system has at least one solution over F2 tends to 1 — 1 m 0.63 for large n ll24l . Hence if we have 
to find a least one solution of a system with m < n equations in n variables it is enough to specialize n — m 
variables randomly in F2; the resulting system has at least one solution with limit probability 0.63 and is 
easier to solve (since the number of equations and variables is only m). Consequently, in the remainder of 
this article we restrict ourselves to the case m>n. 

To the best of our knowledge, in the worst case, the best complexity bound to solve the Boolean MQ 
problem is obtained by a modified exhaustive search in 41og 2 (n)2" operations ifTOl . Being able to decrease 
significantly this complexity is a long-standing open problem and is the main goal of this article. It is crucial 
for practical applications to have sharp estimates of the asymptotic complexity: it is especially important 
in the cryptographic context where this value may have a strong impact on the sizes of the keys needed to 
reach a given level of security. 

Main results. We describe a new algorithm BooleanSolve that solves Boolean MQ for determined or 
overdetermined systems (m = an with a > 1). We show how to adapt it to solve the Boolean MQ SAT 
problem. This algorithm has deterministic and Las Vegas variants, depending on the choice of some linear 
algebra subroutines. Our main result is: 

Theorem 1. The Boolean MQ Problem is solved by Algorithm BooleanSolve. If m = n and the system 
fulfills algebraic assumptions detailed in Theorem^ then this algorithm uses a number of arithmetic oper- 
ations in ¥2 that is: 

• f?(2 841 ") using the deterministic variant; 

• of expectation O(2 0J92n ) using the Las Vegas probabilistic variant. 

Recall that for a probabilistic algorithm of the Las Vegas type, the result is always correct, but the 
complexity is a random variable. Here its expectation is controlled well. 

Outline. Our algorithm is a variant of the hybrid approach by (U : we specialize the last k variables to 
all possible values, and check the consistency of the specialized overdetermined systems (/1, . . . ,f m ) in the 
remaining variables x\,... ,X£. 

This consistency check is done by searching for polynomials hi,..., h m+ e in x\, . . . ,xi such that 

h\fi H \-h m f m + h m+ ixi(l-xi)-\ \-h m+e x((l-x f ) = 1. (1) 

If such polynomials exist then obviously the system is not consistent. Given a bound d on the degrees 
of the polynomials hif and h m+ {Xi{\ — x,), the existence of the hi can be checked by linear algebra. The 
corresponding matrix is known as the Macaulay matrix in degree d. It is a matrix whose rows contain 
the coefficients of the polynomials /; and x,(l — x,) multiplied by all monomials of degree at most d — 2, 
each column corresponding to a monomial of degree at most d. Taking into account the special shape of 
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the polynomials x,(l — x,) leads to a more compact variant that we call the boolean Macaulay matrix (see 
Section [2]>. 

When linear algebra on the Macaulay matrix in degree d produces a solution of ([T]), the correspond- 
ing /ij's give a certificate of inconsistency. Otherwise, our algorithm proceeds with an exhaustive search 
in the remaining variables. In summary, our algorithm is a partial exhaustive search where the Macaulay 
matrices permit to prune branches of the search tree. The correctness of the algorithm is clear. 

The key point making the algorithm efficient is the choice of k and d. If d is large, then the cost of the 
linear algebra stage becomes high. If d is small, the matrices are small, but many branches with no solutions 
are not pruned and require an exhaustive search. This is where we use the relation between the Macaulay 
matrix and Grobner bases. We define a witness degree d w ; t , which has the property that any polynomial in a 
minimal Grobner basis of the system is obtained as a linear combination of the rows of the Macaulay matrix 
in degree d w i t . Hilbert's Nullstellensatz states that the system has no solution if and only if 1 belongs to the 
ideal generated by the polynomials, which implies that 1 is a linear combination of the rows of the Macaulay 
matrix in degree d w i t , making d w ; t an upper bound for the choice of d in ([T}. 

Our complexity estimates rely on a good control of the witness degree. For a homogeneous polynomial 
ideal, the classical Hilbert function of the degree d is the dimension of the vector space obtained as the 
quotient of the polynomials of degree d by the polynomials of degree d in the ideal. The witness degree is 
bounded by the first degree where the Hilbert function of the ideal generated by the homogenized equations 
is 0. Under the algebraic assumption of boolean semi-regularity (see Definition [7]), we obtain an explicit 
expression for the generating series of the Hilbert function, known as the Hilbert series of the ideal. From 
there, in Proposition [7} using the saddle-point method as in El[5j[3]], we show that when m = an and n — > oo, 
the witness degree behaves like d w ; t < c a n for a constant c a that we determine explicitly. Informally, boolean 
semi-regularity amounts to demanding a "sufficient" independence of the equations. In the case of infinite 
fields, a classical conjecture by [23] states that generic systems are semi-regular. In our context where the 
field is F2, we give strong experimental evidence (Section 4.1 ) that for n sufficiently large, boolean semi- 
regularity holds with probability very close to 1 for random systems. Thus, our complexity estimates for 
boolean semi-regular systems apply to a large class of systems in practice. 

Once the witness degree is controlled, the size of the Macaulay matrix depends only on the choice of k 
and the optimal choice depends on the complexity of the linear algebra stage. In the Las Vegas version of 
Algorithm BooleanSolve, we exploit the sparsity of this matrix by using a variant of Wiedemann's algo- 
rithm ll25ll (following BUI l27l l39ll ) for solving singular linear systems. In the deterministic version, we do 
not know of efficient ways to take advantage of the sparsity of the matrix, whence a slightly higher com- 
plexity bound. We can then draw conclusions and obtain a complexity estimate of the algorithm depending 
on k/n and n (Proposition [8]). The optimal value for k is ~ 0.45« in the Las Vegas setting and ~ 0.59n in 
the deterministic variant, completing the proof of our main theorem. 

The complexity analysis is especially important for practical applications in multivariate Cryptology 
based on the Boolean MQ problem, since it shows that in order to reach a security of 2 s (with s large), one 
has to construct systems of boolean quadratic equations with at least s/0.791 1 ~ 1.264s variables. 

Related works. Due to its practical importance, many algorithms have been designed to solve the MQ 
problem in a wide range of contexts. First, generic techniques for solving polynomial systems can be used. 
In particular, Grobner basis algorithms (such as Buchberger's algorithm ifTTl . F4 ifTol . F5 ifTTTL and FGLM 
HH) are well suited for this task. For instance, the F5 algorithm has broken several challenges of the HFE 
public-key cryptosystem |fT9l . In the cryptanalysis context, the XL algorithm ||29ll (which can be seen as a 
variant of Grobner basis algorithms [2]) has given rise to a large family of variants. All these techniques are 
closely related to the Macaulay matrix, introduced by QUI as a tool for elimination. In order to reduce the 
cost of linear algebra for the efficient computation of the resultant of multivariate polynomial systems, the 
idea of using Wiedemann's algorithm on the Macaulay matrix has been proposed by lfl2l : however since 
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the specificities of the Boolean case are not taken into account, the complexity of applying |[T2l to quadratic 
equations is 0(2 4n ). 

ATI propose a heuristic analysis of the FXL algorithm leading them to an upper bound O(2 875 ") for 
the complexity of solving the MQ problem over ¥2. In particular, they give an explicit formula for the 
Hilbert series of the ideal generated by the polynomials. However, the exact assumptions that have to be 
verified by the input systems are unclear. Also, similar results have been announced in 11421 Section 2.2], but 
the analysis there relies on algorithmic assumptions (e.g., row echelon form of sparse matrices in quadratic 
complexity) that are not known to hold currently. Under these assumptions, the authors show that the best 
trade-off between exhaustive search and row echelon form computations in the FXL algorithm is obtained 
by specializing 0.45« variables. This is the same value we obtain and prove with our algorithm. Also, a 
limiting behavior of the cost of the hybrid approach is obtained in [9] when the size of the finite field is big 
enough; these results are not applicable over F2. 

Other algorithms have been proposed when the system has additional structural properties. In particular, 
the Boolean MQ problem also arises in satisfiability problems, since boolean quadratic polynomials can 
be used for representing constraints. In these contexts, the systems are sparse and for such systems of 
higher degree the 2" barrier has been broken |[33l l34l ; similar results also exist for the k-S AT problem. Our 
algorithm does not exploit the extra structure induced by this type of sparsity and thus does not improve 
upon those results. 

Organization of the article. The main algorithm and the algebraic tools that are used throughout the 
article are described in Section [2] Then a complexity analysis is performed in Section [3] by studying the 
asymptotic behaviour of the witness degree and the sizes of the Macaulay matrices involved, under algebraic 
assumptions. In Section [4j we provide a conjecture and strong experimental evidence that these algebraic 
assumptions are verified with probability close to 1 for n sufficiently large. Finally, in Section[5]we propose 
an extension of the main algorithm that improves the quality of the linear filtering when n is small. We also 
show how the complexity results from Section [3] can be applied to the cryptosystem QUAD, leading to an 
evaluation of the sizes of the parameters needed to reach a given level of security. 

2 Algorithm 

Notations. Let m and n be two positive integers and let R be the ring F2IJC1,. . . ,x n ]. In the following, the 
notation Monomials(<f ) stands for the set of monomials in R of degree at most d. 

Since we are looking for solutions of the system in F2 (and not in its algebraic closure), we have to take 
into account the relations xf — xi = 0. Therefore, we consider the application (p mapping a monomial to its 

square- free part (pfJBU*?) = U'Li^f^'" 1 ^ and extended to R by linearity. 

If (/1, . . . ,f m ) G F2 [xi , . . . ,x n ] m is a system of polynomials, its homogenization is denoted by the system 
(ff> , . . . , ffi ) € F 2 [x x , . . . , x„ , h] and is defined by 

f?\x u ..., Xn ,h) = h^ fi ^. 

In the sequel, we consider the classical grevlex monomial ordering (graded reverse lexicographical), as 
defined for instance in Ifl4l §2.2, Defn. 6]. Also, if / is a polynomial, LM(/) denotes its leading monomial 
for that order. If I is an ideal, then LM(7) denotes the ideal generated by the leading monomials of all 
polynomials in I. 

2.1 Macaulay matrix 

Definition 1. Let ,f m ) be polynomials in R. The boolean Macaulay matrix in degree d (denoted by 

Macaulay(ci)) is the matrix whose rows contain the coefficients of the polynomials {(p(tfj)} where 1 < j < 
m, t is a squarefree monomial, and deg (//}•) = d. The columns correspond to the squarefree monomials in 
R of degree at most d and are ordered in descending order with respect to the grevlex ordering. The element 
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in the row corresponding to <p(tfj) and the column corresponding to the monomial m is the coefficient ofm 
in the polynomial (p{tfj). 

Note that the boolean Macaulay matrix can be obtained as a submatrix of the classical Macaulay matrix 
of the system . . . ,f m ,x\ — x\, . . . ,x% — x n ) after Gaussian reduction by the rows corresponding to the 
polynomials (xf — x\, . . . ,x„ —x n ). 

Lemma 1. Let M be the r^ac x c Mac boolean Macaulay matrix of the system (/i, . . . ,f m ) in degree d. Let 
r denote the 1 x CMac vector r = (0, ... ,0, 1). If the linear system u • M = r has a solution, then the system 
fx = ■ ■ ■ = f m = has no solution in 

Proof. If the system u • M = r has a solution, then there exists a linear combination of the rows of the 
Macaulay matrix which yields the constant polynomial 1. Therefore, 1 € (fi , . . . ,f m ,x\ — x\ , . . . ,x% — x n ). 

□ 

2.2 Witness degree 

We consider an indicator of the complexity of affine polynomial systems: the witness degree. It has the 
property that a Grobner basis of the ideal generated by the polynomials can be obtained by performing 
linear algebra on the Macaulay matrix in this degree. In particular, if the system has no solution, then the 
witness degree is closely related to the classical effective Nullstellensatz (see e.g., ll26l ). 

Definition 2. Let F = (/i, ... ,f m , x i ~ x i,- ■ ■ ,*n~ x n) be a sequence of polynomials and I = (F) the ideal it 
generates. Denote by 7<j and by 7<^ the ^i-vector spaces defined by 

I<d = {p\ />e/,deg(/?) <d}, 

J<d = {p\ 3hi,...,h m+n ,\/ie{l,...,m+n},deg(hi) <d-2, 
P = Eli hfi + LU h m+j {x)-xj)}. 

We call witness degree (d w j t ) ofF the smallest integer do such that 7<^ = 7<^ and ({LM(/) | / G ^< ( / }) = 
LM(7). 

Consider a row echelon form of the boolean Macaulay matrix in degree d of the system . . . ,f m ) of 
polynomials. Then the first nonzero element of each row corresponds to a leading monomial of an element 
of /, belonging to LM(7). For large enough d, Dickson's lemma lfl4l §2.4, Thm. 5] implies that the collection 
of those monomials up to degree d generates LM(7) and thus the polynomials corresponding to those rows 
together with {x\ — xi, . . . ,x\ — x n } form a Grobner basis of 7 with respect to the grevlex ordering. Another 
interpretation of the witness degree is that it is precisely the smallest such d. 

2.3 Algorithm 

Our algorithm is given in Algorithm[T] The general principle is to perform an exhaustive search in two steps, 
using a test of consistency of the Macaulay matrix to prune most of the branches of the second step of the 
search. 

When the system u • M = r is consistent, the corresponding branch of the searching tree is not explored. 
In that case, by Lemma [T] any solution of the linear system u • M = r can be used as a certificate that there 
exists no solution of the polynomial system f\ = ••• = /,„= in this branch. 

Proposition 1. Algorithm BooleanSolve is correct and solves the Boolean MQ problem. 

Proof. By Lemma [T] if the test in line 8 finds the linear system to be consistent, then there can be no 
solution with the given values of (a„_£+i, . ..,a„). Otherwise, the exhaustive search proceeds and cannot 
miss a solution. It is important to note that the choice of the actual value do does not have any impact on the 
correctness of the algorithm. □ 
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Algorithm 1 BooleanSolve 

Require: m,n,k 6 N such that m>n> k and f\,...,f m quadratic polynomials in F2[jq, . . . ,x n ]. 
Ensure: The set of boolean solutions of the system f\ = ■ ■ ■ = f m = 0. 
1: S:=0. 

2: do := index of the first nonpositive coefficient in the series expansion at of the rational function 

(1+0"-* 
(l-r)(l+; 2 )'"- 

3: for all (a„_fe+i , . . . ,a n ) E F| do 

4: for i from 1 to m do 

5: fi(xi,- ■ ■ ,X n -k) '■= fi( x U- ■ ■ , x n-k, a n-k+l,- ■ ■ ,O n ) £ F 2 [^i, . . . 

6: end for 

7: M := boolean Macaulay matrix of (/i, . . . ,/ m ) in degree <i(>- 

8: if the system u • M = r is inconsistent then t> r as defined in Lemma[T] 

9: T := solutions of the system (J\ = ■ ■ ■ = f m = 0) (exhaustive search). 

10: for all (h , . . . , t n - k ) £ T do 

ll: 5 := SU {(h,.. ■ ,t„-k,a n -k+ 1, • • - ,«»)}• 

12: end for 

13: end if 

14: end for 

15: return 5. 



Algorithm BooleanSolve is easily be adapted to solve the Boolean MQ SAT problem by replacing lines 
9-12 of the previous algorithm by: 



9: T := at least one solution of the system (j\ = ■ ■ ■ = f m = 0) (modified exhaustive search). 

10: ifr/0then 

ll: return {(h,..., t n - k )\{tl,...,t n -k)eT} 

12: end if 



2.4 Testing Consistency of Sparse Linear Systems 

The choice of the algorithm to test whether the sparse linear system u • M = r is consistent or not is crucial 
for the efficiency of Algorithm BooleanSolve. A simple deterministic algorithm consists in computing a 
row echelon form of the matrix: the linear system is consistent if and only if the last nonzero row of the 
row echelon form is equal to the vector r. We show in Section [3] that this is sufficient to pass below the 2" 
complexity barrier. We recall for future use the complexity of this method. 

Proposition 2 ( ll36ll . Proposition 2.11). The row echelon form of an N x M matrix over a field k can be 
computed in 0(NMr 6 ~ 2 ) arithmetic operations in k, where r is the rank of the matrix and 6 < 3 is such that 
any two nxn matrices over k can be multiplied in 0(n d ) arithmetic operations in k. 

Here, 6 = 3 is the cost of classical matrix multiplication and in this case a simple Gaussian reduction 
to row echelon form is sufficient. The best known value for 8 has been 2.376 for a long time, by a result 
of |[T3l . Recent improvements of that method by [37, 38 ] have decreased it to 2.3727, but this does not have 
a significant impact on our analysis. 

This result does not exploit the sparsity of Macaulay matrices. We do not know of an efficient determin- 
istic algorithm for row reduction that exploits this sparsity. Instead, we use an efficient Las Vegas variant of 
Wiedemann's algorithm due to ll25ll . whose specification is summarized in Algorithm TestConsistency. In 
this algorithm, the matrix A is given by two black boxes performing the operations x H> Ax and u H> A'u. The 
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complexity is expressed in terms of the number of evaluations of these black boxes, which in our context 
will each have a cost bounded by the number of nonzero coefficients of Macaulay matrices. The algorithm 
is presented in [25 ] for matrices with entries in an arbitrary field. We specialize it here in the case where 
the field is ¥2. The key ideas are a preconditioning of the matrix by multiplying it by random Toeplitz ma- 
trices and working in a suitable field extension to get access to sufficiently many points for picking random 
elements. 

Algorithm 2 TestConsistency ||25 1 

Require: • A black box for x h-> A • x, where A € K NxN . 

• A black box for u 1— > A' • u. 

• beK^ 1 . 

Ensure: • ("consistent",x) with A • x = b if the system has a solution 

• ("inconsistent",u) if the system does not have a solution, with u' • A = and u' ■ b 7^ 0, certifying 
the inconsistency. 



Proposition 3 ( B25I0 . Algorithm^determines the consistency of an N xN matrix with expected complexity 
0(NlogN) evaluations of the black boxes and 0(N 2 log 2 NloglogN) additional operations in ¥2- 

Macaulay matrices are rectangular. We therefore first make them square by padding with zeroes. The 
complexity estimate is then used with N the maximum of the row and column dimensions of the matrices. 

3 Complexity Analysis 

Algorithm BooleanSolve deals with a large number of Macaulay matrices in degree do. We first obtain 
bounds on the row and column dimensions of Macaulay matrices, as well as their number of nonzero entries, 
in terms of the degree. We then bound the witness degree by do. The complexity analysis is concluded by 
optimizing the value of the ratio k/n that governs the number of variables evaluated in the first exhaustive 
search. 

3.1 Sizes of Macaulay Matrices 

Proposition 4. Let (/i,...,/ m ) be quadratic polynomials in F2[jci, . . . ,x n ]. Denote by rMac (resp. CMao 
SMac) the number of rows (resp. columns, number of nonzero entries) of the associated boolean Macaulay 
matrix in degree d. If I < d < n/2, then 

CMac < T^c ' rMac < m (l-2x)(l-x) 
where x = d/n. 

Proof. The number of columns of the boolean Macaulay matrix is simply the number of squarefree mono- 
mials of degree at most d in n variables. The number of rows is that same number of monomials for de- 
gree d — 2, multiplied by the number m of polynomials. Finally, each row corresponding to a polynomial /, 
has a number of nonzero entries bounded by the number of squarefree monomials of degree at most 2 in n 
variables. Standard combinatorial counting thus gives 

CMac = £ (^j , r Mac = m £ (^j , s Mac < ^1 + n + ^ r Mac < n 2 r Mac , (3) 



U)' JMac<m " 2 (l-2x)(l-x)U)' (2) 
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where in the last inequality we use the fact that n>2. Now, the bounds come from a well-known inequality 
on binomial coefficients: for < d < n/2, 



d / n\ 1 fn 



1=0 



E < 



i J \—d/{n — d) \d 



Indeed, the sequence (") is increasing for < i < n/2. Factoring out vvj leaves a sum that is bounded by 

the geometric series 1 + d/(n — d) H . This gives the bound for CMac- The bound for rMac is obtained 

by evaluating this bound at d — 2, writing ( d " 2 ) as a rational function times Q) and finally bounding x{x — 

l/n)/((l-2x + 4/ra)((l-x) + l/?2))byx 2 /((l-2x)(l-x)). ' □ 

3.2 Bound on the Witness Degree of Inconsistent Systems 

First, we prove that the witness degree can be upper bounded by the so-called degree of regularity of the 
homogenized system. Here and subsequently, we call dimension of an ideal I C R the Krull dimension of 
the quotient ring R/I (see e.g., [15, §8]). 

Definition 3. The degree of regularity d reg (7) of a homogeneous ideal I of dimension is defined as the 
smallest integer d such that all homogeneous polynomials of degree d are in I. 

Proposition 5. Let F = (j\ , . . . ,f m ,x 2 — x\ , . . . ,x 2 —x n ) be a sequence of polynomials such that the system 
F = has no solution. Then the ideal generated by the homogenized system 

(A) -/f(h) Ah) Y 2_ Y , 2 



I — (f\ )••■) fm i x \ x\h,...,x n x n h 
has dimension and d w j t (F) < d re g(/^). 

Proof. By Hilbert's Nullstellensatz, the ideal / generated by F contains 1 (hence 1 is a Grobner basis 
of /). Therefore, there exists a G N \ {0} such that h a E I^ h \ Consequently, for the grevlex ordering, 
(x 2 1 ,...,x 2 n ,h a ) C LM(/W) and thus the dimension of LM(/W) is 0. As a consequence (see H14L §9.3, 
Prop. 9]), dim(/W) = dim(LM(/( /1 ))) = 0. 

Let be a minimal Grobner basis of the homogenized ideal I^ h ) for the grevlex ordering. By definition 
of the degree of regularity, there exist polynomials and £'j such that h^i 1 ) = Y,\<i< m Ll<j<n(Xj- 
Xjh)l'j. The ideal 1^ being homogeneous, it is possible to find such a combination with deg(/^- £{) < 
dreg(^' /! - ) ) ) deg((xy —Xjh)£'j) < d reg (/^) for all Evaluating this identity at h = 1 shows that 1 belongs to 
the vector space generated by the rows of the boolean Macaulay matrix in degree d reg (/W ). □ 

Next, the degree of regularity can be obtained from the classical Hilbert series. 

Definition 4. Let R^') be the ring ¥2[x\, . . . ,x n ,h], and let R^ be the vector space of homogeneous poly- 
nomials of degree d. Also, for I C R^ h ' a homogeneous ideal, let Ij denote the vector space defined by 

(h) 

Ij=R\ HI. The Hilbert function HF/ and the Hilbert series HS/ of I are defined by 



HF/(d) = dim(R { d h) /I d ), HS/(0 = £ HF,(d)t d 

i=0 



In view of the definition of the degree of regularity, if / is a zero-dimensional ideal of R^ h \ then HS/(?) 
is a polynomial of degree d reg (/) — 1. 

The next step is to obtain information on the Hilbert series for a large class of systems. To this end, we 
consider the so-called syzygy module, which describes the algebraic relations between the polynomials of a 
system. 
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Definition 5. Let (gi,-..,gi) G (R^Y be a polynomial system. A syzygy of (gi,---,gc) is a l-tuple 
(s\,. . . ,S() G (R^ Y such that £f_i s igi = 0- The set of all syzygies of(g\,..., gg) is a submodule of (Rw Y- 
The degree of a syzygy s = (s\,...,S£) is defined as deg(s) = maxi<,-<£cleg(g,j;). 

Obviously, for any such polynomial system, commutativity induces syzygies of the type 

gigj~gjgi = 0. (4) 

Moreover, for any constant a G F2 we have the relation a 2 = a, thus expanding the square of a polyno- 
mial Lagjsij/t o K x a e F2 [jci , . . . ,Xk] gives LaeN* a a^ 2a - As a consequence, for a homogeneous quadratic poly- 
nomial fi = ^i<j 1 k<n a j,k x j x k + Hi<j< n bjXjh + ch 2 e¥2[xi,...,x n ,h], we obtain the following syzygy of 
the system (f- h \x 2 — x\h, . . . ,x 2 —x n h): 

{fl h) -h 2 )ff ) + £ a hk {x 2 {x 2 -x j h)+x j h{x 2 -x k h))+ £ bjh 2 (x 2 - X jh) = 0. (5) 

l<j,k<n 1<7<« 

Definition 6. Let = (/j , . . . ,/i ,x\ — x\h, . . . ,x 2 —x n h) be a system of homogeneous quadratic poly- 
nomials over ¥2- We call trivial syzygies of F ^ and note Syztriv the module generated by the syzygies of 
types and (Q. 

Definition 7. A boolean homogeneous system (f[ h \ ■ ■ ■ ,fm) iJ called 

• boolean semi-regular in degree D if any syzygy whose degree is less than D belongs to Syztdv! 

• boolean semi -regular if it is boolean semi-regular in degree d reg ((/J , . . . ,fm\x\ — x\h, . . . ,x 2 —x n h)). 

(This notion is slightly different from the semi-regularity over ¥2 defined in SIS).) 

In the sequel we use the following notations: if S G Z[[f]] is a power series, then [5] denotes the series 
obtained by truncating S just before the index of its first nonpositive coefficient. Also, [t d ]S(t) denotes the 
coefficient of t d in 5. 

Proposition 6. Let (/} , . . . ,fm) be a boolean homogeneous system. Let Dq denote the degree of reg- 
ularity of the system (f[ , . . . ,fm\x\ — x\h, . . . ,x 2 — x n h). If the systems (f[ , ■ ■ ■ , f- h \, jf — h 2 ) and 
(f[ h \ ■ ■ ■ ,f(_i,fi ■ ) are Dq — 2 (resp. Do)-boolean semi-regular for each i G {2, . . . ,m\, then the Hilbert 
series of the homogeneous ideal (f[ , ■ ■ ■ ,fm \x 2 — X\h, . . . ,x 2 —x n h) is 



HS tt m (?) 



(l+Q" 
(i-0(i+? 2 ) m 



Proof. Let S; (resp. S[) denote the system (/j , . • . ,fj h \x 2 — x\h, . . . ,x 2 — x n h) (resp. (f[ h \. . . ,jf — 
h 2 ,x 2 — x\h, ...,x 2 — x n h)). The general framework of this proof is rather classical: we prove by induction 
on i and d that for all i < mmdd < D , HF^(d) = HF^(d) = [t d ] ^^i+t 2 y 

First, notice that a basis of the F2-vector space R/{x\ —x\h, . . . ,x 2 —x n h) is the set of monomials 6 = 
{jcf 1 • • -xfrh 1 I Si, . . . , 8 n G {0, 1},£ G N}. The generating function of this set is 

y <:deg(m) _ (1 +0" 

rte "(1-0* 
Therefore, the initialization of the recurrence comes from the relations 

f ^ (x\-x l h,...,x 2 n -x„h) (d) = [t d ] ^ for all d G N; 

\ HF <Si> (0) = H F {5;) (0) = 1 and H F {Sj) ( 1 ) = H F (s;) (1) = n + 1 for alU < m. 
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In the following, 2 < d < Dq and 1 < i < m are two integers, and we assume by induction that for all 
, j) G N 2 such that I < d or (I = d and j < i), we have 

HF W W = HF (j}) «)^ I ^ j¥ . 
Consider the following sequences 

-> *j£ 2 /(*-i + (f^-h 2 )) d -2 ^ Rf/{Si-i)d -> rJVCS/)^ -> o 



o. 



where the last arrow of each sequence is the canonical projection. Let g be in the kernel of the application 

*?V(*-i+c/f° -tf>) rf -2 -^jtfVft-i)* 

Then g/J belongs to (Sj-Oi, which implies that there exist polynomials gi,.. . h\,...,h„ such that 
(gl, . . . ,gi-i,g,hi, . . . ,h n ) is a syzygy of degree <i of the system Sj. By the boolean semi-regularity assump- 
tion, this syzygy belongs to Syztnv, and hence g G (S;-i) + (f; —h 2 }. Therefore the application Xfj is 
injective and the first sequence is exact. One can prove similarly that the second sequence is also exact. 
These exact sequences yield relations between the Hilbert functions: 

HF s; (J -2) - HFs ; _, (d) + HF Si (0 = 0, (6) 

H F Si (d - 2) - H F Jw (af) + H F s; (d) = 0. (7) 

Moreover, we have the relation 

, (1+0" , (1+0" _ r^- 2l (1+0" f o, 

[ J (i-o(i +t 2 y L J (i-o(i +f 2 v -1 (i-0(i+^ 2 v" 

Using Relations ([6]> and Q, and the induction hypothesis, we get the desired result. 

The proof is completed by showing that Do is equal to the index of the first nonpositive coefficient of 
HFs m (0- First, by definition of the degree of regularity, the coefficients [^']HFs m (0 are zero for d > Dq. 
Next, that the coefficient [t D °] p^w^gp is nonpositive follows from the following property (easily proved 
by induction on i, < i < m using (|7]-[8]>): 

" %- ( o|,Sy £ HF * (D »»- 

□ 



Putting everything together, we have obtained the following. 

Corollary 1. With the same notation as in Proposition^ if the homogenized system verifies the conditions 
of Proposition^ then the witness degree of the system 

if l > • • • ifmi^X %1 j ■ ■ ■ i x n ~ x n) 
is bounded by the degree of the polynomial HS„ m (0- 
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At this stage, it might seem that choosing the degree of V\S n -k,m for do in Algorithm BooleanSolve 
amounts to making a very strong assumption on the nature of the systems obtained by specialization followed 
by homogenization. In Section [4| we discuss experiments showing that this assumption is actually quite 
reasonable. 

Finally, in order to compute the asymptotic behavior of our complexity estimates in the next section, we 
need the following. 



Proposition 7. Let <x>\be a real number. Then, as n — 

deg(HS n r„ nl (0) ~M(a)n, 



1 1 



with M(x) := -x+ - + - \Jlx 2 - lOx- 1 + 2(x + 2)a/x(x + 2). 

Proof. We follow the approach of (H [5). We start from a representation of the coefficient as a Cauchy 
integral: 

M (1+0" 1 I (l+z) B 1 



[t \\-t)(l+t 2 ) m 2%lJ (l- z )(l +z 2)ra»l^+l*' 

where the contour is a circle centered in whose radius is smaller than 1 . We are searching for a value of d 
where this integral vanishes, for large n. We first estimate the asymptotic behaviour of the integral for fixed 
d. The integrand has the form exp(n/(z)) with 

f(z) = log(l +Z ) - W log(l +z 2 ) - Ml-z) + (</+Dlog(^ 
n n 

As n increases, the integral concentrates in the neighborhood of one or several saddle points, solutions to 
the saddle -point equation zf = 0, which rewrites 

A , 7 2 1-27 

- = yr -^Ar ~ 4tA = : + 0(1 In). (9) 

n l+z l+Z l n(l-z) 

In H, it is shown that for the contributions of saddle points to cancel out, two of them must coalesce and 
give rise to a double saddle point, given by the smallest positive double real root of the saddle-point equation, 
which is therefore such that (zf)' = 0. When n grows, the solutions of this equation tend towards the roots 
of <j>'(z) = 0. Let zq be the smallest positive real root of this equation. The saddle-point equation ([9]) then 
gives d ~ <j>(zo)n. Finally, eliminating zo using <p'(zo) = by a resultant computation yields 



d ~ ^-a + ^ + ^Y / ' 2a2 - 10a - 1 + 2 ( a + 2 )v / «(a + 2) 



n. 



□ 



Figure [T] shows the actual values of deg(HS n j an ] )/n for a = 1 /.55. Notice that this sequence converges 
rather slowly. This is due to the fact that we only take into account the first term in the asymptotic expansion 
of deg(HS„ r a n])- ^ would be possible to obtain the full asymptotic expansion using techniques similar to 
those in |4 5\. However, this would not change the asymptotic complexity of Algorithm[T] 

3.3 Complexity 

We now estimate the complexity of Algorithm BooleanSolve by going through its steps and making all 
necessary hypotheses explicit. We consider the case when the number of variables n and the number of 
polynomials m are related by m ~ an for some a > 1 and n is large. Also we assume that the ratio k/nis 
controlled by a parameter y e [0, 1], i.e., k = (1 — y)n. 
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5(H) 1 1500 

Figure 1: Comparison of deg(HS n v n /,ss\)/n (black) with its limit (red). 

The first step (lines 4 to 6 in the algorithm) is to evaluate the polynomials f\ from the polynomials fi. 
With no arithmetic operations, the polynomials fi can first be written as polynomials in (xi, . . . ,x n -k) with 
coefficients that are polynomials of degree at most 2 in x n -k+\ , ■ ■ ■ ,*n an d at most 1 in each variable. Each 
such coefficient has at most 1 +k + (*) monomials, each of which can be evaluated with at most one arith- 
metic operation. The total number of these polynomial coefficients is at most m{\ + n — k+ ("T*) )■ Thus the 
total cost of all the evaluations of the coefficients of the polynomials is at most 0(n 5 2^^^ n ). This turns 
out to be asymptotically negligible compared to the next steps. 

The next stage (line 8) of our algorithm consists in performing tests of inconsistency of the Macaulay 
matrices. 

Proposition 8. For any e > 0, OC > 1 and sufficiently large m = \ttn\, the complexity of all tests of consis- 
tency of Macaulay matrices in Algorithm BooleanSolve with parameters (m,n,k) is 

• 0(2( l ~ 7+eFa( -rt +e *> n ) in the deterministic variant; 

• of expectation 0(2^ 7+2Fa ^' +e ' n ) in the probabilistic variant, 

where y=l—k/n, F a (y) = — ylog 2 (D D (l — D) l ~ D ) with D = M{a/y), the function M as in Proposition^ 
and G the complexity of linear algebra as in Proposition^ 

A notable feature of this result is that in terms of complexity, the probabilistic variant of our algorithm 
behaves as the deterministic one where the linear algebra would be performed in quadratic complexity (i.e., 
with d = 2). 

Proof. We first estimate the size of the Macaulay matrices. By Proposition [7] the index do, which is 1 + 
deg(HS„_/t. m ) behaves asymptotically like yDn. The function M(x) is decreasing for x > 1, so that D < 
M(l) < 1/2. Thus, do < yn/2 for n sufficiently large and Proposition [4] applies with d = do, m = \an\ 
equations and n — k = yn variables. For n sufficiently large, the bound for rMac is larger than that for CMao 
since the quotient of these two bounds is m/(^ — l) 2 , which grows linearly with n. 



12 



Next, we turn to the tests of inconsistency. The previous bounds and Proposition |2] imply that the 
number of operations required for the computation of the row echelon form is 0{n\^fj ). Similarly, by 
Proposition [3] the complexity of checking the consistency of each matrix by the probabilistic method is 
0( r Mac log( r Mac) J Mac) = 0(n 4 2 log (^") ) and that bound dominates the cost of the additional operations 
in F 2 . Now, Stirling's formula implies that for any < b < a, log (™) ~ nlog(a a / (b h (a - b) a ~ h )). Set- 
ting a = y and b = yD gives the result, the extra factor being due to the exhaustive search that performs this 
consistency check 2^ times. □ 

In the cases where the linear system u • M = r is found inconsistent, then the polynomial system itself 
may be consistent and the algorithm proceeds with an exhaustive search (line 9) in a system with yn un- 
knowns. Each such search has cost 0(2^ +e ^"). As long as the number of these searches does not exceed 
0(2( l ~ 2 Y +2Fa ^" n ), the overall complexity of the algorithm is bounded by the complexity given in Proposi- 
tion [8j There can be two causes for the inconsistency of the linear system that triggers such a search: the 
existence of an actual solution with x n = a„, . . . , Jc« — yt+ 1 = a n -k+\ ; a witness degree of the specialized system 
larger than do (e.g., if the homogenized specialized system is not boolean semi-regular). We now define a 
class of systems where this does not happen too much. 

Definition 8. Let S=(f\,...,f m ) be quadratic polynomials in F 2 [x\ , ... ,x n ], < k = (1 — y)n < n, (X = m/n 
and do = 1 +deg(HS„_ J < :jm ). The system S is called y-strong semi-regular if both the set of its solutions in F2 
and the set 

{(a n - k+ i,...,a n ) € ¥ k 2 I 

dwit (/l (-"Cl ) • • • i x n-k,Qn-k+l j • • • >#n)j • • • ifm( x l, ■ ■ ■ ^n-k^n-k+\: ■ ■ ■ ^n)) > <^o} 

have cardinality at most 2^^ 2 ^ +2Fa ^", with F a as in Proposition^ 

Note that since 1 — 2y+2F a (y) is a decreasing function of 7, a y-strong semi-regular system is also 
/-strong semi-regular for any / < y. 

The first condition for a system to be y-strong semi-regular concerns its number of solutions. For boolean 
systems drawn uniformly at random, it is known that the probability that the number of boolean solutions is s 
decreases more than exponentially with s ll24l . so that the first condition is fulfilled with large probability. 
The second condition is related to the proportion of boolean semi-regular systems. We discuss this condition 
in the next section and show that it is also of large probability experimentally. Under this assumption of y- 
strong semi-regularity, we now state the complexity of the algorithm obtained by optimizing the choice of 
the number k of variables that are specialized. 

We first discuss large values of y. The function 1 — 2y+ 2F a {y) is decreasing with a and negative when 
y = 1. Thus, the first condition implies that a 1-strong semi-regular system has no solution. By continuity, 
this behavior persists for y close to 1 and actually holds for y G (0.824, 1). It also persists for smaller values 
of y and larger a. 

Corollary 2. With the same notations as in Prop. [S] when a system is y-strong semi-regular with a and 
y such that 1 — 2y+ 2F a (y) < 0, then it is inconsistent and detected by Algorithm BooleanSolve with 
parameters (m,n,0) in 0(2^ eFa ^ +e ^ n ) operations. 

The value of the exponent 6F a (\) in terms of a is plotted in Figure [2] (it corresponds to the right part of the 
plots, i.e. a > 1.82 for d = 2, a > 2.48 for d = 2.376, a > 3.64 for d = 3). 

Proof. The hypothesis implies that the system has no solution and that its witness degree is bounded by do, 
so that its absence of solution is detected by the linear algebra step in degree do- In that case, no exhaustive 
search is needed. □ 
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Exponent in the cost of all consistency tests 




1 1.5 



a 



Figure 2: Exponent of the complexity for inconsistent systems in terms of the ratio a (see Thm. [2] and 
Cor. 0) 



For smaller values of y, the algorithm requires exhaustive searches. The optimal choice of k is obtained 
by an optimization on the complexity estimate. This leads to the following complexity estimates. In the 
next section, we argue that the required strong semi-regularities are very likely in practice, so that the only 
choice left to the user is that of the linear algebra routine. 

Theorem 2. Let S = (/i , . . . , f m ) be a system of quadratic polynomials in ¥2 [x\ , . . . , x n ], with m = \an\ and 
OC > 1. Then Algorithm BoolesnSolve finds all its roots in FJJ with a number of arithmetic operations in F2 
that is 

• o{2^- QAna >) ifS is (21 a)-strong semi-regular using Gaussian elimination for the linear algebra 
step; 

• 0(2^^° 159o! )") if s is (AOoc)-strong semi-regular using computation of the row echelon form with 
Coppersmith-Winograd multiplication; 

• of expectation Q(2^ l ^°- 2ma ^ n ) ifS is (.55 a) -strong semi-regular using the probabilistic Algorithm^ 

In all cases, the value of k passed to the algorithm is \n(\ — 7)] with 7 corresponding to the strong semi- 
regularity. 

Proof. The correctness of the algorithm has already been proved in Proposition [T] Only the complexity 
remains to be proved. 

By definition of strong semi-regularity, the number of exhaustive searches that need be performed in 
line 9 of the Algorithm is 0(2^~ ly+2Fa ^ n ), each of them using 0(2^ +£ ^ n ) arithmetic operations for any e > 
0. It follows that the overall cost of these exhaustive searches is 0(2^ l ^ y+2Fa ^ +£ ^ n ); it is bounded by the 
cost of the tests of inconsistency. We now choose 7 in such a way as to minimize this cost, in terms of a. 
Direct computations lead to the following numerical results, that conclude the proof. □ 

Lemma 2. With the same notation as in Proposition^ the function 1 — 7+ dF a (y) is bounded by 
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1 - 0. 1 12a when 6 = 3 and y = 0.27a; 



• 1 - 0.159a when d = 2.376 and / = 0.40a; 

• 1 - 0.208a when 8 = 2 and y = 0.55a. 

Proof. The function 1 — /+ 9F a {y) has two parameters but its extrema can be found by reducing it to a one 
parameter function. Indeed, this function is maximal for a > 1 and y G [0, 1] when (— / + 8F a (y))/a is. 
Setting X = y/a, this is exactly —X + QF\(X), with X € [0, 1/a]. Direct computations lead to the optimal 
A's: X = min(l/a,0.27) when d = 3, X = min(l/a,0.40) when d = 2.376, X = min(l/a,0.55) when 
= 2. □ 

4 Numerical Experiments on Random Systems 

Probabilistic model. In this section, we study experimentally the behavior of Algorithm BooleanSolve 
of random quadratic systems where each coefficient is or 1 with probability 1/2. These random boolean 
quadratic systems appear naturally in Cryptology since the security of several recent cryptosystems relies 
directly on the difficulty of solving such systems (see e.g., OH). 

4.1 /-strong semi-regularity 

The goal of this section is to give experimental evidence that the assumption of /-strong semi-regularity is 
not a strong condition for random boolean systems. This is related to the notoriously difficult conjecture 
by 1231 . which states that in characteristic 0, almost all systems are semi -regular (with the meaning of 
semi-regularity given in ||5l), see also PTl . 

Consequently, we propose the following conjecture, which can be seen as a variant of Froberg's conjec- 
ture for boolean systems: 

Conjecture 1. For any a > 1 and y < 1 such that 1 — 2y + 2F a (y) > 0, the proportion of y- strong semi- 
regular systems of \an\ quadratic polynomials in ¥2 [x\ , . . . ,x n ] tends to 1 when n — > 00. 

The rest of this section is devoted to providing experiments supporting this conjecture. 

In Figure [3] we show the relation between the value of the first nonpositive coefficient of the power 
series expansion of HS^„j.„ and /-strong semi-regularity for small values of n = m (i.e. a = 1). For each n, 
the experiments are conducted on 1000 random quadratic boolean systems. For each of these systems, we 
compute the 2^ 1_ ^"T specialized systems and we count the number of specializations for which the filtering 
linear system is inconsistent. 

Four curves are represented on each chart in Figure [3] The red (resp. green) one represents the average 
(resp. maximal) number of specializations for which the linear system (step 8 of Algorithm BooleanSolve) 
is inconsistent. In contrast, the blue curve shows the upper bound on this number of specializations required 
to be /-strong semi-regular (see Definition [8]). The black curve shows the absolute value of the first nonpos- 
itive coefficient of the corresponding power series (i.e. HS^„j „). The y-axis is represented in logarithmic 
scale. The value / = 0. 1 is never used in the complexity analysis (since in Theorem[2j / > .27 for any value 
of a > 1). However, it is still interesting to study the behavior of Algorithm [T] when almost all variables are 
specialized: the filtering remains very efficient in this case, and the branches which are explored during the 
second stage of the exhaustive search correspond to those containing solutions of the system. 

Interpretation of Figure |3j First, notice that for / < 0.55 the green curve is always below the blue one 
(except for the case / = .55, n = 23), meaning that during our experiments, all randomly generated systems 
with those parameters were /-strong semi-regular. 

Next, in most curves (except / = 0.27), the average (resp. maximal) number of points where the spe- 
cialization leads to an inconsistent linear system is close to 1 (resp. 5). This can be explained by a simple 
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Figure 3: Relation between the quality of the filtering, the value of the first nonpositive coefficient of 
HSi™,j „, and /-strong semi -regularity. In red (resp. green), the average (resp. maximum) number of spe- 
cializations for which the linear system is inconsister^. In blue, the bound for /-strong regularity. Dashed 
line: absolute value of the first non positive coefficient of HSi™i „. 



Poisson model. Indeed, the number of solutions of a random boolean system with as many equations as 
unknowns follows a Poisson law with parameter 1 (see ll24lD . Therefore, the expectation of the number of 
solutions is 1. The expectation of the maximum of the number of solutions of 1000 random systems is then 
given as the maximum of 1000 iid random variables Pi,... ,Piooo following a Poisson law of parameter 1: 

k-i 




E(max(P 1 ,...,Aooo)) = EM (^T^) 1000 -^ 1 !^) 1000 ) ^ 5 " 51 ' 



which explains very well the observed behaviour. 

This means that during Algorithm [T] with these parameters, almost all specializations giving rise to an 
inconsistent system correspond to a branch of the exhaustive search which contains an actual solution of the 
system. Therefore, the filtering is very efficient for those parameters. 

Few specializations. In the case y = 0.9, the blue curve has a negative slope. This is due to the fact that the 
quantity 1 — 2y+ 2F a {y) (see Definition [8]) is negative for a = 1 and y > 0.82308. Therefore, we cannot 
expect that a large proportion of boolean systems are /-strong semi-regular in this setting. A limit case is 
investigated in the chart corresponding to 7 = 0.81. There, 1 — 2y + 2F a {y) « 0.0102 is positive but very 
close to zero. Experiments show that random boolean systems with these parameters and 10 < n < 24 are 
/-strong semi-regular with probability approximately equal to 0.75. 

Absolute value of the first nonpositive coefficient of HSi™i „ and y-strong semi-regularity. Another 
interesting setting is 7= .55, n = 23. Here, no generated systems were 7-strong semi-regular (although all 
generated systems for n^23 were 7-strong semi-regular). As explained in Section |5TTj this is due to the fact 



that the first nonpositive coefficient of the power series expansion of HSi^j „ is equal to zero. In Section 5.2 
we show that this phenomenon can be avoided by a simple variant of the algorithm. 

A similar phenomenon happens for 7 = .27: the first nonpositive coefficient of the power series has 
small absolute value. It is an accident due to the fact that this coefficient is close to zero for n < 25 (see 
Figure [4]). On this chart, we can see clearly the relation between the absolute value of the first nonpositive 
coefficient of HS^„j „ and the number of specializations for which the consistency test fails. 

Indeed, experiments on 1000 random systems with 7 = .27 and n = 26 were conducted and in this case 
the average number of specializations for which the linear system is inconsistent is 1 . 

These experiments justify the fact that the complexity analysis conducted in Section [3] is relevant for a 
large class of boolean systems. Also, it shows that the random systems for which the filtering may not be 
efficient can be detected a priori by looking at the absolute value of the first nonpositive coefficient in the 
power series. If this value is small, we show in Section 5.2 that the quality of the filtering can be improved 
at low cost by adding redundancy. 

Figure [4] shows the evolution of the logarithm of the absolute value of the first nonpositive coefficient 
of HSi™i n . This absolute value seems to grow exponentially with n for any given 7. Since the quality 
of the filtering is related to this absolute value, these experiments suggest that the proportion of 7-strong 
semi-regular systems tends towards 1 when n grows, as formulated in Conjecture [T] 

4.2 Numerical estimates of the complexity 

When n = m and in the most favorable algorithmic case, our complexity estimate uses 7 = .55. For this 
value, we display in Figure |Tj (page 12 1 a comparison of the behaviour of deg(HS„ )/n and its limit. This 
picture shows a relatively slow convergence. Thus, for a given number n of variables it is more interesting to 
optimize 7 using the exact value of deg(HS^„j ,„) rather than a first order asymptotic estimate. In the same 
spirit, one can also use the actual values given by Eq. ([2]) for the Macaulay matrix. Thus we seek to find 7 
that minimizes the following bounds on the number of operations: 

2 (1 ~ r) V Ma cCMacmin(r M ac,CMac) e ~ 2 , /jq-j 

resp. 2( 1 -^"max(r M ac,CMac)logmax(r M ac,CMac)^Mac 
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Figure 4: Evolution of the logarithm of the absolute value of the first nonpositive coefficient of HS^„j „. 



in the deterministic (resp. probabilistic) variants, using Eq. Q with n equations, \yn\ variables and d = 
deg(HS^„j „). The corresponding values of 7 are given in Figure [5J together with the corresponding values 
of the quantities in Eq. ( fT0| ). Although these values do not take into account the constants hidden in the 
0() estimates of the complexity, they suggest the relevance of these algorithms in the cryptographic sizes: 
the threshold between exhaustive search and our algorithm with Gaussian elimination is n ~ 280, while the 
asymptotically faster Las Vegas variant starts being faster than exhaustive search for n larger than 200 and 
beats deterministic Gaussian elimination for n larger than 160. 

5 Extensions and Applications 

5.1 Adding Redundancy to Avoid Rank Defects 



We showed in Section 4.1 that when the first nonpositive coefficient of HS n _,t,f? is close to zero, then the 
linear filtering may not be as efficient as expected (for instance in the case 7 = .55, n = 23 in Figure [3]). 
Another case is shown in Figure [6] The curve 5 = shows the behavior of Algorithm [T] on random square 
systems (m = n) where k is chosen as small as possible such that the witness degree is d w ; t = 2: this is 



obtained by choosing k 



^7+8 



(that is do = 2). 



1/2+n- , 

First, we observe that specializing a uniformly distributed random quadratic polynomial P G F2 [x\ , . . . ,x n ] 
at a uniformly distributed random point in ¥\ yields a random polynomial that is also uniformly distributed 
in F2[xi, . . . ,x„-k\- We assume here that P is reduced modulo the field equations (xf — x\,. . . ,x„ —x n ). Let 
us assume first that k = 1. Then P can be rewritten as 



P(xi, ■ ■ ■ ,x n ) = x n P\ (xi,- ■ ■ +P 2 (xi, • . . ,x„_i), 

where P\ (resp. Pi) is a random polynomial following a uniform distribution on the set of reduced boolean 
polynomials of degree 1 (resp. of degree 2) in F2[jci, . . . ,x„-i]. Therefore, if a G F2 is a random variable, 
P{x\ , . . . ,x n -i, a) 6 F2 [x\ , . . . ,Xn-i] is either Pi or Pi + P2 and thus follows a uniform distribution on the set 
of reduced quadratic boolean polynomials. The extension to arbitrary k < n follows by induction. 

Consequently, in the special case do = 2 of Figure[6]the boolean Macaulay matrix of a specialized system 
will be uniformly distributed among the boolean matrices with the same dimensions. Also, due to the choice 
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Figure 5: Left: optimal values of /for the probabilistic variant (red), the deterministic variant with Gaussian 
elimination (black) and Coppersmith-Winograd matrix multiplication (blue), and their limits. Right: corre- 
sponding values of log 2 N/n, with N given by (TTOb. The green line corresponds to an exhaustive search. 
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Figure 6: Proportion of specialized quadratic systems for which the linear system (line 
9 of Algorithm jlj is consistent. Parameters: k 



1/2 + 72- 



. In red, 5 = 



(corresponding to Algorithm [I]); in green, 5 = 1 (see Algorithm [5] of Section 5.2). 
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of k, it will be roughly square. However, in ¥2, the probability that a random square matrix has full rank is 
not close to 1. An estimate of this probability can be obtained as follows. 

The probability that a random p x q boolean matrix has rank r is (see ll2Tll35l ) 

Pipqr) 2 -„ n£(2*-2')n£(2«-2J) 

p M> r >- 2 iYj-J (r-2J) 

Therefore, given a nonzero vector v G Fj and a random boolean p xq matrix M, the probability that the 
linear system u • M = v is consistent is 



p ( 2' - 1 

Q(p,q) = JL^Gw) 



2<?-l 



Direct numerical computations show that for square matrices, Q(p,p) ~ 0.61 as soon as p > 4. This 
probability corresponds to the valleys of the curve 8 = in Figure [6] Also, it can be noticed that Q(p,q) 
grows quickly with p — q. For instance, Q(p + 6,p) ~ 0.99 when p > 1. 

Consequently, it is interesting to specialize more variables than k in some cases (especially when the 
first nonpositive coefficient of (1 +*)" /((! — +t 2 ) m ) has small absolute value): doing so increases 
the difference between the dimensions of the Macaulay matrices. This does not change the correctness of 
the algorithm (nor its asymptotic complexity), but increases the effectiveness of the filtering performed by 
linear algebra. 

5.2 Improving the quality of the filtering for small values of n 

In this section, we propose an extension of Algorithm BooleanSolve which takes an extra argument 8, in 



order to avoid the behavior of the algorithm shown in Section 5.1 The main idea is to specialize k + 8 
variables, but to take only k into account for the computation of do- Consequently, the difference between 
the number of columns and the rank of the Macaulay matrix is not too small, and hence the linear filtering 
performs better. The resulting algorithm is given in Algorithm [3j 

Algorithm 3 improved BooleanSolve. 

Require: m,n,k,S 6 N such that k + 8 < n < m and f\ , . . . ,f m quadratic polynomials in F2[xi , . . . , x n ]. 
Ensure: The set of boolean solutions of the system f\ = • • • = f m = 0. 

1: 
2 



9 
10 
11 
12 
13 
14 
15 



5:=0. 

do := index of the first nonpositive coefficient in the series expansion of the rational function jjz^yu^rys - 

for all (a n _ k _ s+l ,. .. ,a n ) G ¥ k 2 +s do 
for i from 1 to m do 

fi( x l j • • • -i x n-k-S) := fi\ x li ■ ■ ■ i x n-k-8i a n-k-8+\ i ■ ■ ■ i a n)- 

end for 

M := boolean Macaulay matrix of . .. ,f m ) in degree do- 
it the system u • M = r is inconsistent then > r as defined in Lemma [T] 

T := solutions of the system (J\ = ■ ■ ■ = f m = 0) (exhaustive search). 

for all (ti,...,tn-k-s) e^do 

S := SU{(tl,.. .,t n _k_Si a n-k-S+li- ■ •)««)}• 

end for 
end if 
end for 
return 5. 
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In Figure [6| we show the role of the parameter 5 when k is chosen minimal such that do = 2: adding 
redundancy by choosing a nonzero 5 can greatly improve the quality of the filtering (in practice, choosing 
8 = 1 is sufficient). 




Figure [7] shows further experimental evidence that adding redundancy by choosing 5 = 1 permits to 
avoid problems occurring when the first nonpositive coefficient of HS,,-^,,, is close to zero. For instance, 
the peak at 7= .55, n = 23 that appeared in Figure [^disappears when 5 = 1. 

5.3 Cases with Low Degree of Regularity 

In some cases, when the boolean system is not random, the choice of do proposed in Algorithm Boolean- 
Solve may be too large. This happens for instance for systems that have inner structure, which has an 
impact on the algebraic structure of the ideal generated by the polynomials. Examples of such structure can 
be found in Cryptology, for instance with boolean systems coming for the HFE cryptosystem lf32l . as shown 
in EQ. 

For these systems, the choice of do as the index of the first non-positive coefficient of HS, ljm would be 
very pessimistic, since the Macaulay matrices in degree do would be larger than necessary. However, if 
estimates of the witness degree are known (this is the case for HFE), then do can be chosen accordingly as 
a parameter of the Algorithm BooleanSolve. 

5.4 Application in Cryptology 

Careful implementation of the algorithm will be necessary to estimate accurately the efficiency of the 
BooleanSolve algorithm. For instance a crucial operation is the Wiedemann (or block Wiedemann) al- 
gorithm; in practice, it is probably useless to work in a field extension F 2 * as requested by Proposition [3] 
Working directly over F2 and packing several elements (bits) into words may have a dramatic effect on the 
constant hidden in Theorem [2] In the following we estimate the impact of the new algorithm from the point 
of view of a user in Cryptology. In other words, if the security of a cryptosystem relies on the hardness 
of solving a polynomial system, by how much must the parameters be increased to keep the same level of 
security? 
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The stream cipher QUAD ||6l|7l enjoys a provable security argument to support its conjectured strength. 
It relies on the iteration of a set of overdetermined multivariate quadratic polynomials over F2 so that the 
security of the keystream generation is related, in the concrete security model, to the difficulty of solving 
the Boolean MQ SAT problem. A theoretical bound is used in [7] to obtain secure parameters for a given 
security bound T and a given maximal length L of the keystream sequence that can be generated with a 
pair (key, IV): for instance (see Q p. 1711), for T = 2 80 ,L = 2 40 ,k = 2 and an advantage of more than 
e = 1/ 100, the bound gives w > 331. We report in the following table various values of n depending on L, 
T and e : 



T 


L 


£ 


n 


2 8U 


2 40 


1/100 


331 


2*0 


2 12 


1/100 


253 




2 8U 


1/100 


613 


2 lbU 


2 4U 


1/100 


445 


2 lbU 


2 4U 


1/1000 


448 


2 160 


2 40 


1/10000 


467 


2 256 


2 40 


1/100 


584 


2 256 


2 80 


1/100 


758 



Security parameters for the stream cipher QUAD [7] 

Now, the question is to achieve a security bound for T = 2 256 ; what are the minimal values of m and n 
ensuring that solving the Boolean MQ SAT requires at least T bit-operations? Using the complexity analysis 
of the BooleanSolve algorithm we can derive useful lower bounds for n when m = n or m = 2n (m = 2n 
corresponds to the recommended parameters for QUAD). In the following table we report the corresponding 
values: 



Security Bound T 


2 128 


2 256 


2 512 


2 1024 


Minimal value of n when m = n 


128 


270 


576 


1202 


Minimal value of n when m = 2n 


145 


335 


738 


1580 



Comparing with exhaustive search we can see from this table that: 

• our algorithm does not improve upon exhaustive search when n is small (for instance when m = n and 
T = 2 128 that are the recommended parameters); 

• by contrast, our algorithm can take advantage of the overdeterminedness of the algebraic systems: this 
explains why the values we recommend are larger than expected when n is large and/or m/n > 1. 
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